A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

نویسندگان

  • Alex Becker
  • Fabio Kepler
  • Sara Candeias
چکیده

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and American Sign Language could be used for training Machine Learning models for automatic translation between the two languages. Clearly, this kind of tool must be designed in a way that it eases the task of human annotators, not only by being easy to use, but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy to use annotation tool for building parallel corpora between spoken and sign languages we aim at helping the development of proper resources for sign languages that can then be used in state-of-the-art models currently used in tools for spoken languages. There are several issues and difficulties in creating this kind of resource, and our presented tool already deals with some of them, like adequate text representation of a sign and many to many alignments between words and signs.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted..

More and more sign languages nowadays are now documented by large scale digital corpora. But exploiting sign language (SL) corpus data remains subject to the time consuming and expensive manual task of annotating. In this paper, we present an ongoing research that aims at testing a new approach to better mine SL data. It relies on the methodology of corpus-based contrastive linguistics, exploit...

متن کامل

Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling

issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...

متن کامل

Exploiting the Web as Parallel Corpora for Cross- Language Information Retrieval

The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be done by exploiting a large set of parallel texts. However, the problem arisen is the unavailability of large parallel corpora for many languages. In this paper, we describe a mining system that automat...

متن کامل

Constructing Parallel Corpus from Movie Subtitles

This paper describes a methodology for constructing aligned German-Chinese corpora from movie subtitles. The corpora will be used to train a special machine translation system with intention to automatically translate the subtitles between German and Chinese. Since the common length-based algorithm for alignment shows weakness on short spoken sentences, especially on those from different langua...

متن کامل

Polyphraz: a tool for the quantitative and subjective evaluation of parallel corpora

The PolyphraZ tool is under construction in the framework of the TraCorpEx project (Translation of Corpora of Examples), for the management of parallel multilingual corpora (coding, format, correspondence). It is a software platform allowing the preparation and handling of parallel corpora (languages, codings...), parallel presentation, and addition of new languages to existing corpora by calli...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016